Dataset description
There are two submissions: 10267 & 10270.
- Each submission includes .vcf files for 2390 families.
- For each family, two .vcf files are provided:
  - one named “sorted”
  - the other named “annotated”
Submission 10267
- For files named “sorted”:
  - 852 families without GL/PL information
  - 1538 families with valid GL/PL information
    - 310 trios
    - 1228 families with >=1 sibling
- For files named “annotated”:
  - 1096 families without GL/PL information
  - 1294 families with valid GL/PL information
    - 309 trios
    - 985 families with >=1 sibling
Note that for FID 13562, the .vcf file contains no information for the father. Also, every family with valid GL/PL information in the files named “annotated” is also among the families with valid GL/PL information in the files named “sorted”.
Submission 10270
- For files named “sorted”, there is no GL/PL information.
- For files named “annotated”:
  - 703 families without valid GL/PL information
    - including 13 families with fewer than 2000 variants
  - 1687 families with valid GL/PL information
    - 292 trios
    - 1395 families with >=1 sibling
Combined
Combining 10267 and 10270, there are 2206 families with complete .vcf files:
- 415 trios
- 1791 families with >=1 sibling
Call de novo mutations
Triodenovo was used to call de novo mutations:
- Only variants with GL/PL information were retained.
- Families were split into parents-offspring trios.
- Filters: --minDP 7 --minDepth 10, with default values for all other options
- Post filters (following Homsy et al. 2015, Science):
  - For offspring: a minimum of 10 total reads, a minimum of 5 alternate allele reads, and a minimum alternate allele ratio of 20% if alternate allele reads ≥10 or 28% if alternate allele reads <10
  - For parents: a minimum depth of 10 reference reads and an alternate allele ratio <3.5%
The script is stored at /scratch/90days/uqywan67/auti_proj/SSC/scripts/call_deno.R
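The read-depth post filters listed above can be sketched in Python as follows. This is only an illustration of the stated thresholds, not the contents of call_deno.R. The function names are hypothetical, and the sketch assumes that total reads equal reference reads plus alternate reads (for example, as taken from the AD field of a VCF).

```python
def offspring_passes(ref_reads: int, alt_reads: int) -> bool:
    """Offspring filter: >=10 total reads, >=5 alternate reads, and an
    alternate allele ratio of >=20% (alt reads >= 10) or >=28% (alt reads < 10)."""
    total = ref_reads + alt_reads  # assumption: total depth = ref + alt
    if total < 10 or alt_reads < 5:
        return False
    min_ratio = 0.20 if alt_reads >= 10 else 0.28
    return alt_reads / total >= min_ratio


def parent_passes(ref_reads: int, alt_reads: int) -> bool:
    """Parent filter: >=10 reference reads and alternate allele ratio < 3.5%."""
    if ref_reads < 10:
        return False
    return alt_reads / (ref_reads + alt_reads) < 0.035


def trio_passes(child: tuple, father: tuple, mother: tuple) -> bool:
    """Each argument is a (ref_reads, alt_reads) pair for one trio member."""
    return (
        offspring_passes(*child)
        and parent_passes(*father)
        and parent_passes(*mother)
    )
```

For example, `trio_passes((12, 5), (30, 0), (25, 0))` keeps the call, while a child with 15 reference and 5 alternate reads is rejected because its alternate ratio (25%) is below the 28% required when fewer than 10 alternate reads are observed.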
Annotation
- ANNOVAR was used to annotate refGene and allele frequencies.
- hg19refGene, exac03nonpsy, gnomad_exome211 databases were used.
- Based on the annotation, DNMs were further filtered to:
  - exonic or canonical splice-site variants
  - MAF <= 0.001 in the non-psychiatric subset of ExAC (header: ExAC_nonpsych_ALL in ANNOVAR) and in the control samples of gnomAD (header: controls_AF_popmax in ANNOVAR)
- Gene-level pLI scores for PTVs were downloaded from ExAC.
- MPC scores for missense variants were annotated using VEP.
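The annotation-based filter described above can be sketched as a small Python predicate over one row of an ANNOVAR output table. This is a minimal sketch under stated assumptions: the frequency headers are the ones named above, while the functional column name `Func.refGene`, the values `exonic` and `splicing`, and the treatment of a missing frequency (`.`) as zero are assumptions that should be checked against the actual ANNOVAR output.

```python
MAF_CUTOFF = 0.001
FREQ_COLUMNS = ("ExAC_nonpsych_ALL", "controls_AF_popmax")
KEPT_FUNCS = {"exonic", "splicing"}  # assumed labels for exonic / splice-site


def parse_freq(value) -> float:
    """Convert an allele-frequency field to float.

    Assumption: a missing value ('.' or empty) means the variant was not
    observed in that database, so it is treated as frequency 0.
    """
    if value is None or value in (".", ""):
        return 0.0
    return float(value)


def keep_variant(row: dict) -> bool:
    """Return True if the row is exonic/splicing and rare in both databases."""
    # A variant can carry several labels, e.g. 'exonic;splicing'.
    funcs = set(row["Func.refGene"].split(";"))
    if not funcs & KEPT_FUNCS:
        return False
    return all(parse_freq(row.get(col)) <= MAF_CUTOFF for col in FREQ_COLUMNS)
```

Rows could be read with `csv.DictReader(handle, delimiter="\t")` and passed to `keep_variant` one at a time. Treating a missing frequency as zero keeps variants that are absent from ExAC or gnomAD, which is the behaviour this sketch assumes rather than one confirmed by the pipeline.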
DNMs summary
After applying the filters, a total of 4222 DNMs were found in 1763 families with 2438 offspring.
- 3386/4222 (80.2%) DNMs matched published SSC DNMs from Krumm et al. 2015 and Iossifov et al. 2014.
- 274 trio families (455 DNMs) and 1489 quad families (3767 DNMs, including 1876 DNMs in 1120 probands and 1891 DNMs in 1044 siblings).
- 3617 DNMs in 2081 males and 605 DNMs in 357 females.
- 2331 DNMs in 1394 probands and 1891 DNMs in 1044 siblings.
- 2808 DNMs were absent from ExAC, 2900 DNMs were absent from gnomAD, and 2593 DNMs were absent from both datasets.
DNMs in quad families
- A total of 3767 DNMs were observed in 1489 quad families
- 1876 DNMs in 1120 probands and 1891 DNMs in 1044 siblings
- 3282 DNMs in 1892 males and 485 DNMs in 272 females
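The breakdowns reported above should add up to the same totals. A short Python sketch of that cross-check, using only the counts given in this summary (the labels and the function name are illustrative):

```python
# Each entry: (description, value derived from the breakdown, reported total).
CHECKS = [
    ("trio + quad DNMs", 455 + 3767, 4222),
    ("trio + quad families", 274 + 1489, 1763),
    ("proband + sibling DNMs", 2331 + 1891, 4222),
    ("proband + sibling offspring", 1394 + 1044, 2438),
    ("male + female DNMs", 3617 + 605, 4222),
    ("male + female offspring", 2081 + 357, 2438),
    ("quad proband + sibling DNMs", 1876 + 1891, 3767),
    ("quad male + female DNMs", 3282 + 485, 3767),
    ("quad offspring by role vs. by sex", 1120 + 1044, 1892 + 272),
    ("percent matching published DNMs", round(100 * 3386 / 4222, 1), 80.2),
]


def find_mismatches(checks=CHECKS) -> list:
    """Return the descriptions of all checks whose two values differ."""
    return [label for label, derived, reported in checks if derived != reported]


if __name__ == "__main__":
    bad = find_mismatches()
    print("all counts consistent" if not bad else f"inconsistent: {bad}")
```

With the numbers as reported, `find_mismatches()` returns an empty list. Rerunning the check after any update to the filters is a cheap way to catch a breakdown that no longer matches its total.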
Burden test analysis